Modeling segmental durations for Japanese text-to-speech synthesis

Authors

  • Jennifer J. Venditti
  • Jan P. H. van Santen
Abstract

Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of segmental duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and the effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of both consonants and vowels. A Sum-of-Products approach is used to model key interactions observed in the data, and to predict values for factor combinations not found in the speech database. We report overall observed-predicted correlations of 0.88 for vowels (RMS deviation: 16.8 ms) and 0.94 for consonants (RMS deviation: 12.5 ms).
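
As a rough illustration of the ideas named in the abstract, the sketch below shows one way a sum-of-products predictor and the two reported evaluation measures (observed-predicted correlation and RMS deviation) could be computed. It is not the Bell Labs implementation: the function names, the factor names, and every numeric scale value are hypothetical placeholders chosen only to make the example run, and the general form assumed here is a sum over terms, where each term is a product of per-factor scale values.

```python
import math

def predict_duration(features, terms):
    """Sum-of-products prediction: add up the terms, where each term is
    the product of one scale value per factor in that term."""
    total = 0.0
    for term in terms:
        product = 1.0
        for factor, scale in term.items():
            product *= scale[features[factor]]
        total += product
    return total

def rms_deviation(observed, predicted):
    """Root-mean-square deviation between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy scales (illustrative only, NOT values from the paper):
# term 1 multiplies a phone-specific base duration by an accent factor,
# term 2 adds a phrase-position lengthening amount in milliseconds.
TOY_TERMS = [
    {"phone": {"a": 80.0, "i": 60.0}, "accent": {"yes": 1.2, "no": 1.0}},
    {"position": {"final": 15.0, "medial": 0.0}},
]

if __name__ == "__main__":
    segment = {"phone": "a", "accent": "yes", "position": "final"}
    print(predict_duration(segment, TOY_TERMS))
```

Because each scale is estimated per factor level rather than per full context, a model of this shape can produce a prediction for a combination of levels that never occurred together in the training data, which is the property the abstract highlights.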

Similar resources

Modeling vowel duration for Japanese text-to-speech synthesis

Accurate estimation of segmental durations is crucial for natural-sounding text-to-speech (TTS) synthesis. This paper presents a model of vowel duration used in the Bell Labs Japanese TTS system. We describe the constraints on vowel devoicing, and effects of factors such as phone identity, surrounding phone identities, accentuation, syllabic structure, and phrasal position on the duration of bot...

Modeling segmental duration in German text-to-speech synthesis

This paper reports on the construction of a model for segmental duration in German. The model predicts the durations of speech sounds in various textual, prosodic, and segmental contexts. It has been implemented in the German version of the Bell Labs text-to-speech system [18, 12]. The construction of the duration system was made efficient by the use of an interactive statistical analysis packag...

Duration modeling for hindi text-to-speech synthesis system

This paper reports preliminary results of data-driven modeling of segmental (phoneme) duration for Hindi. Classification and Regression Tree (CART) based data-driven duration modeling for segmental duration prediction is presented. A number of features are considered and their usefulness and relative contribution for segmental duration prediction are assessed. Objective evaluation of the duration...

Duration Modeling For Turkish Text-to-Speech Synthesis System

Naturalness of synthetic speech depends on appropriate modeling of prosodic aspects. Mostly, three prosody components are modeled: segmental duration, pitch contour and intensity. In this study, we present our work on modeling segmental duration in Turkish by using machine-learning algorithms. The models predict phone durations based on attributes such as phone identity, neighboring phone ident...

CART-based duration modeling using a novel method of extracting prosodic features

The prediction of accurate segmental durations remains a difficult problem when synthesising speech from text. Inaccurate durations are often perceptually prominent and detract from the naturalness of the quality of speech. For a concatenative system, a statistical approach is an excellent way of predicting segmental durations. More specifically a CART (Classification And Regression Tree) metho...

Publication date: 1998